-
Over three percent of people carry a dominant pathogenic variant, yet only a fraction of carriers develop disease. Disease phenotypes amongst carriers of variants in the same gene range from mild to severe. Here, we investigate underlying mechanisms for this heterogeneity: variable variant effect sizes, carrier polygenic backgrounds, and modulation of carrier effects by genetic background (marginal epistasis). We leveraged exomes and clinical phenotypes from the UK Biobank and the Mt. Sinai BioMe Biobank to identify carriers of pathogenic variants affecting cardiometabolic traits. We employed recently developed methods to study these cohorts, observing strong statistical support and clinical translational potential for all three mechanisms of variable carrier penetrance and disease severity. For example, scores from our recent model of variant pathogenicity were tightly correlated with phenotype amongst clinical variant carriers, predicted the effects of variants of unknown significance, and distinguished gain-of-function from loss-of-function variants. We also found that polygenic scores modify phenotypes amongst pathogenic variant carriers and that genetic background additionally alters the effects of pathogenic variants through interactions.
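A minimal sketch of the kind of model the three mechanisms suggest (not the authors' pipeline): a quantitative trait regressed on carrier status, a polygenic score, and their interaction, where the interaction term plays the role of the carrier-by-background ("marginal epistasis") component. All variable names, effect sizes, and data below are hypothetical.

```python
# Illustrative sketch only: main effects of carrier status and polygenic score (PGS),
# plus a carrier-by-PGS interaction capturing background-dependent variant effects.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
carrier = rng.binomial(1, 0.03, n)          # pathogenic-variant carrier status (hypothetical rate)
pgs = rng.normal(0.0, 1.0, n)               # standardized polygenic score

# Simulated trait: main effects plus an interaction term (assumed coefficients).
trait = 1.5 * carrier + 0.4 * pgs + 0.6 * carrier * pgs + rng.normal(0, 1, n)

# Ordinary least squares with an interaction column.
X = np.column_stack([np.ones(n), carrier, pgs, carrier * pgs])
beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
print(dict(zip(["intercept", "carrier", "pgs", "carrier_x_pgs"], beta.round(2))))
```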
-
Traditional models of supervised learning require a learner, given examples from an arbitrary joint distribution on R^d × {±1}, to output a hypothesis that competes (to within ε) with the best-fitting concept from a class. To overcome hardness results for learning even simple concept classes, this paper introduces a smoothed-analysis framework that only requires competition with the best classifier robust to small random Gaussian perturbations. This subtle shift enables a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and low-dimensional convex sets, which are only known to be learnable in non-smoothed settings with respect to highly structured distributions like Gaussians. The analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, the authors present the first algorithm for agnostically learning intersections of k-halfspaces in time k · poly(log k, ε, γ), where γ is the margin parameter. Previously, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
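A small sketch of the two objects the abstract pairs (not the paper's algorithm): an intersection-of-k-halfspaces concept, which depends only on a k-dimensional subspace, and its "smoothed" prediction obtained by voting over small Gaussian perturbations of the input. The dimensions, k, and noise scale sigma are illustrative assumptions.

```python
# Illustrative sketch: a multi-index concept (intersection of k halfspaces) and its
# majority label under small random Gaussian perturbations of the input point.
import numpy as np

rng = np.random.default_rng(1)
d, k, sigma = 10, 3, 0.05

W = rng.normal(size=(k, d))            # normals of the k halfspaces
b = rng.normal(size=k)                 # offsets

def intersection_of_halfspaces(x):
    """Label +1 iff x lies in all k halfspaces, else -1."""
    return 1 if np.all(W @ x + b >= 0) else -1

def smoothed_label(x, n_samples=200):
    """Majority label of the concept over small Gaussian perturbations of x."""
    votes = [intersection_of_halfspaces(x + sigma * rng.normal(size=d))
             for _ in range(n_samples)]
    return 1 if np.mean(votes) >= 0 else -1

x = rng.normal(size=d)
print(intersection_of_halfspaces(x), smoothed_label(x))
```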
-
Azar, Yossi; Panigrahi, Debmalya (Eds.)
We provide the first analysis of (deferred acceptance) clock auctions in the learning-augmented framework. These auctions satisfy a unique list of very appealing properties, including obvious strategyproofness, transparency, and unconditional winner privacy, making them particularly well-suited for real-world applications. However, early work that evaluated their performance from a worst-case analysis perspective concluded that no deterministic clock auction with n bidders can achieve an O(log^{1-ε} n) approximation of the optimal social welfare for a constant ε > 0, even in very simple settings. This overly pessimistic impossibility result heavily depends on the assumption that the designer has no information regarding the bidders’ values. Leveraging the learning-augmented framework, we instead consider a designer equipped with some (machine-learned) advice regarding the optimal solution; this advice can provide useful guidance if accurate, but it may be unreliable. Our main results are learning-augmented clock auctions that use this advice to achieve much stronger performance guarantees whenever the advice is accurate (known as consistency), while maintaining worst-case guarantees even if this advice is arbitrarily inaccurate (known as robustness). Our first clock auction achieves the best of both worlds: (1 + ε)-consistency for any desired constant ε > 0 and O(log n) robustness; we also extend this auction to achieve error tolerance. We then consider a much stronger notion of consistency, which we refer to as consistency_∞, and provide an auction that achieves a near-optimal trade-off between consistency_∞ and robustness. Finally, using our impossibility results regarding this trade-off, we prove lower bounds on the “cost of smoothness,” i.e., on the robustness that is achievable if we also require that the performance of the auction degrades smoothly as a function of the prediction error.
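A toy, single-winner ascending clock auction with (possibly wrong) advice, meant only to illustrate the interface of the mechanisms studied here: bidders watch a personal price clock and irrevocably quit once it exceeds their value, and advice about the likely optimal winner steers the price schedule. The price-update rule and the parameter eps below are illustrative assumptions, not the paper's auctions.

```python
# Hypothetical sketch of a prediction-guided clock auction loop (not the paper's mechanism).
def clock_auction(values, predicted_winner, eps=0.1):
    active = set(range(len(values)))
    prices = [0.0] * len(values)
    while len(active) > 1:
        for i in list(active):
            # Raise the clock faster for bidders the advice does not favor.
            step = eps if i == predicted_winner else 2 * eps
            prices[i] += step
            if prices[i] > values[i]:   # bidder quits once priced out; exiting is a dominant strategy
                active.remove(i)
    return active.pop() if active else None

# Accurate advice lets the predicted high-value bidder survive the clock;
# inaccurate advice still yields a terminating auction with some winner.
print(clock_auction(values=[3.0, 7.0, 5.0], predicted_winner=1))
```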
-
We study the problem of PAC learning γ-margin halfspaces with Massart noise. We propose a simple proper learning algorithm, the Perspectron, that has sample complexity Õ((εγ)^{-2}) and achieves classification error at most η + ε, where η is the Massart noise rate. Prior works [DGT19, CKMY20] came with worse sample complexity guarantees (in both ε and γ) or could only handle random classification noise [DDK+23, KIT+23] -- a much milder noise assumption. We also show that our results extend to the more challenging setting of learning generalized linear models with a known link function under Massart noise, achieving a sample complexity similar to the halfspace case. This significantly improves upon the prior state of the art in this setting due to [CKMY20], who introduced this model.
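A small sketch of the data model in the abstract (not the Perspectron itself): labels of a γ-margin halfspace corrupted by Massart noise, where each example's label is flipped independently with probability η(x) ≤ η. The flip probabilities, γ, and η below are illustrative assumptions.

```python
# Illustrative sketch: generating gamma-margin halfspace data under Massart noise.
import numpy as np

rng = np.random.default_rng(2)
d, n, gamma, eta = 5, 1000, 0.1, 0.2

w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

# Draw unit-norm examples and keep only those with margin at least gamma.
X = rng.normal(size=(3 * n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X = X[np.abs(X @ w_star) >= gamma][:n]

clean = np.sign(X @ w_star)
flip_prob = rng.uniform(0.0, eta, size=len(X))   # Massart: flip rate may vary per point, but never exceeds eta
noisy = np.where(rng.random(len(X)) < flip_prob, -clean, clean)

print(f"{len(X)} examples, {np.mean(noisy != clean):.2%} labels flipped")
```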
-
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Eds.)
In the strategic facility location problem, a set of agents report their locations in a metric space, and the goal is to use these reports to open a new facility, minimizing an aggregate distance measure from the agents to the facility. However, agents are strategic and may misreport their locations to influence the facility’s placement in their favor. The aim is to design truthful mechanisms, ensuring agents cannot gain by misreporting. This problem was recently revisited through the learning-augmented framework, aiming to move beyond worst-case analysis and design truthful mechanisms that are augmented with (machine-learned) predictions. The focus of this prior work was on mechanisms that are deterministic and augmented with a prediction regarding the optimal facility location. In this paper, we provide a deeper understanding of this problem by exploring the power of randomization as well as the impact of different types of predictions on the performance of truthful learning-augmented mechanisms. We study both the single-dimensional and the Euclidean case and provide upper and lower bounds on the achievable approximation of the optimal egalitarian social cost.
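Two 1-D placement rules, as a minimal sketch of the objects discussed above. The median of the reports is the classic truthful rule for total (utilitarian) cost; the second rule, which clips a machine-learned prediction into the range of reports, is one natural prediction-augmented rule shown only for illustration and is not necessarily a mechanism analyzed in the paper.

```python
# Illustrative 1-D facility location rules (sketch, hypothetical example data).
from statistics import median

def median_mechanism(reports):
    """Classic truthful rule in 1-D: no agent can pull the median toward itself by misreporting."""
    return median(reports)

def prediction_clipped_mechanism(reports, prediction):
    """Place the facility at the prediction, clipped to the interval spanned by the reports."""
    return min(max(prediction, min(reports)), max(reports))

reports = [0.0, 2.0, 9.0]
print(median_mechanism(reports))                      # 2.0
print(prediction_clipped_mechanism(reports, 4.5))     # 4.5 (accurate advice is used directly)
print(prediction_clipped_mechanism(reports, 100.0))   # 9.0 (wildly wrong advice is clipped)
```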
-
A fundamental notion of distance between train and test distributions from the field of domain adaptation is discrepancy distance. While in general hard to compute, here we provide the first set of provably efficient algorithms for testing localized discrepancy distance, where discrepancy is computed with respect to a fixed output classifier. These results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift (TDS learning) due to Klivans et al. (2023). Our approach generalizes and improves all prior work on TDS learning: (1) we obtain universal learners that succeed simultaneously for large classes of test distributions, (2) we achieve near-optimal error rates, and (3) we give exponential improvements for constant-depth circuits. Our methods further extend to semi-parametric settings and imply the first positive results for low-dimensional convex sets. Additionally, we separate learning and testing phases and obtain algorithms that run in fully polynomial time at test time.
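A brute-force sketch of the "localized" quantity: with the output classifier h fixed, measure how differently each candidate f from a (here, tiny and finite) class disagrees with h on unlabeled train versus test samples, and take the largest gap. This only illustrates what is being tested; the classes, shift, and data below are hypothetical, and the paper's algorithms avoid this exhaustive enumeration.

```python
# Illustrative estimate of localized discrepancy for a fixed halfspace h over a small candidate class.
import numpy as np

rng = np.random.default_rng(3)
d = 4
X_train = rng.normal(size=(2000, d))
X_test = rng.normal(loc=0.3, size=(2000, d))     # shifted test marginal (assumed for illustration)

h = rng.normal(size=d)                           # fixed output classifier (a halfspace)
candidates = [rng.normal(size=d) for _ in range(20)]

def disagreement(u, v, X):
    """Fraction of X on which the halfspaces sign(u.x) and sign(v.x) disagree."""
    return np.mean(np.sign(X @ u) != np.sign(X @ v))

localized_disc = max(abs(disagreement(h, f, X_train) - disagreement(h, f, X_test))
                     for f in candidates)
print(f"estimated localized discrepancy: {localized_disc:.3f}")
```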